26 research outputs found

    Efficient Code for Second Order Analysis of Events on a Linear Network

    We describe efficient algorithms and open-source code for the second-order statistical analysis of point events on a linear network. Typical summary statistics are adaptations of Ripley's K-function and the pair correlation function to the case of a linear network, with distance measured by the shortest path in the network. Simple implementations consume substantial time and memory. For an efficient implementation, the data structure representing the network must be economical in its use of memory, but must also enable rapid searches to be made. We have developed such an efficient implementation in C with an R interface written as an extension to the R package spatstat. The algorithms handle realistic large networks, as we demonstrate using a database of all road accidents recorded in Western Australia.
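As a rough illustration of the quantity being estimated (our own toy sketch, not the spatstat implementation, and omitting the geometric edge correction that a proper estimator applies), an uncorrected empirical network K-function counts pairs of events whose shortest-path distance is at most r:

```python
# Toy sketch of an uncorrected network K-function. Assumes events sit on
# network vertices; all names here are illustrative, not from spatstat.
from itertools import product

def shortest_paths(n, edges):
    """All-pairs shortest-path distances by Floyd-Warshall."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def network_K(points, dist, total_length, r):
    """Uncorrected network K at distance r:
    (L / (n(n-1))) * number of ordered pairs within shortest-path distance r."""
    n = len(points)
    close = sum(1 for i, j in product(points, points)
                if i != j and dist[i][j] <= r)
    return total_length * close / (n * (n - 1))

# Toy network: a triangle of three road segments, each of length 1,
# with one event at each vertex.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
dist = shortest_paths(3, edges)
K = network_K([0, 1, 2], dist, total_length=3.0, r=1.0)
print(K)  # -> 3.0: all 6 ordered pairs lie within shortest-path distance 1
```

The quadratic-memory distance matrix above is exactly the kind of naive structure the paper's economical C implementation is designed to avoid on realistic large networks.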

    Variable selection using penalised likelihoods for point patterns on a linear network

    Motivated by the analysis of a comprehensive database of road traffic accidents, we investigate methods of variable selection for spatial point process models on a linear network. The original data may include explanatory spatial covariates, such as road curvature, and ‘mark’ variables attributed to individual accidents, such as accident severity. The treatment of mark variables is new. Variable selection is applied to the canonical covariates, which may include spatial covariate effects, mark effects and mark-covariate interactions. We approximate the likelihood of the point process model by that of a generalised linear model, in such a way that spatial covariates and marks are both associated with canonical covariates. We impose a convex penalty on the log-likelihood, principally the elastic-net penalty, and maximise the penalised log-likelihood by cyclic coordinate ascent. A simulation study compares the lasso, ridge regression and elastic-net methods of variable selection on their ability to select variables correctly, and on their bias and standard error. Standard techniques for selecting the regularisation parameter γ often yielded unsatisfactory results. We propose two new rules for selecting γ which are designed to have better performance. The methods are tested on a small dataset on crimes in a Chicago neighbourhood, and applied to a large dataset of road traffic accidents in Western Australia.
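The optimisation machinery can be sketched in miniature (a hedged illustration under our own assumptions: a Gaussian least-squares loss stands in for the generalised-linear-model approximation of the likelihood, and all names are hypothetical). Cyclic coordinate descent updates one coefficient at a time via soft-thresholding:

```python
# Toy cyclic coordinate descent for elastic-net penalised least squares.
# Not the authors' code; the penalised GLM in the paper is approximated
# here by a penalised linear model for illustration only.

def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2)
    by cycling through the coordinates of b."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of covariate j with the partial residual
            # (the fit excluding coordinate j)
            rho = sum(X[i][j] * (y[i]
                                 - sum(X[i][k] * b[k] for k in range(p))
                                 + X[i][j] * b[j]) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return b

# Toy data: y depends on the first covariate only; the lasso penalty
# (alpha = 1) should set the irrelevant second coefficient exactly to zero.
X = [[1.0, 0.1], [2.0, -0.2], [3.0, 0.1], [4.0, -0.1]]
y = [2.0, 4.0, 6.0, 8.0]
beta = elastic_net(X, y, lam=0.1, alpha=1.0)
```

Setting `alpha` between 0 and 1 blends the lasso and ridge penalties, which is the trade-off the elastic net is designed to exploit.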

    Analysing point patterns on networks — A review

    We review recent research on statistical methods for analysing spatial patterns of points on a network of lines, such as road accident locations along a road network. Due to geometrical complexities, the analysis of such data is extremely challenging, and we describe several common methodological errors. The intrinsic lack of homogeneity in a network militates against the traditional methods of spatial statistics based on stationary processes. Topics include kernel density estimation, relative risk estimation, parametric and non-parametric modelling of intensity, second-order analysis using the K-function and pair correlation function, and point process model construction. An important message is that the choice of distance metric on the network is pivotal in the theoretical development and in the analysis of real data. Challenges for statistical computation are discussed and open-source software is provided.

    Diffusion Smoothing for Spatial Point Patterns

    Traditional kernel methods for estimating the spatially-varying density of points in a spatial point pattern may exhibit unrealistic artefacts, in addition to the familiar problems of bias and over- or under-smoothing. Performance can be improved by using diffusion smoothing, in which the smoothing kernel is the heat kernel on the spatial domain. This paper develops diffusion smoothing into a practical statistical methodology for two-dimensional spatial point pattern data. We clarify the advantages and disadvantages of diffusion smoothing over Gaussian kernel smoothing. Adaptive smoothing, where the smoothing bandwidth is spatially-varying, can be performed by adopting a spatially-varying diffusion rate: this avoids technical problems with adaptive Gaussian smoothing and has substantially better performance. We introduce a new form of adaptive smoothing using lagged arrival times, which has good performance and improved robustness. Applications in archaeology and epidemiology are demonstrated. The methods are implemented in open-source R code.
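The central idea can be sketched numerically (a one-dimensional toy under our own assumptions, not the paper's two-dimensional method): smoothing with the heat kernel amounts to running the heat equation on the domain, and a reflecting boundary keeps all probability mass inside the domain, unlike a Gaussian kernel truncated at an edge:

```python
# Toy diffusion smoothing on a 1-D grid. Each explicit finite-difference
# step spreads mass to neighbouring cells; reflecting (zero-flux) ends
# model the boundary of the spatial domain.

def diffuse(u, steps, dt=0.25):
    """Run `steps` explicit heat-equation steps (dt <= 0.5 for stability)."""
    u = list(u)
    for _ in range(steps):
        v = u[:]
        for i in range(len(u)):
            left = u[i - 1] if i > 0 else u[i]              # reflecting end
            right = u[i + 1] if i < len(u) - 1 else u[i]    # reflecting end
            v[i] = u[i] + dt * (left - 2 * u[i] + right)
        u = v
    return u

# Point pattern as counts on a grid of 8 cells; two points in the edge cell.
counts = [2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
density = diffuse(counts, steps=20)
print(round(sum(density), 6))  # -> 3.0: total mass is conserved at the edge
```

A spatially-varying diffusion rate (replacing the constant `dt` coefficient by a per-cell value) would give the adaptive smoothing described above, without the technical difficulties of adaptive Gaussian bandwidths.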

    Hierarchical clustering of MS/MS spectra from the firefly metabolome identifies new lucibufagin compounds

    Metabolite identification is the greatest challenge when analysing metabolomics data, as reference standards exist for only a small proportion of metabolites. Clustering MS/MS spectra is a common method to identify similar compounds; however, interrogating the signature fragmentation patterns underlying each cluster can be problematic. Previously published high-resolution LC-MS/MS data from the bioluminescent beetle (Photinus pyralis) provided an opportunity to mine for new specialized metabolites in the lucibufagin class, compounds important for defense against predation. We aimed to 1) provide a workflow for hierarchically clustering MS/MS spectra from metabolomics data, enabling users to cluster, visualise and easily interrogate the identification of underlying cluster ion profiles, and 2) use the workflow to identify key fragmentation patterns for lucibufagins in the hemolymph of P. pyralis. Features were aligned to their respective MS/MS spectra, product ions were dynamically binned, and the resulting spectra were hierarchically clustered and grouped based on a cutoff distance threshold. Using the simplified visualization and the interrogation of cluster ion tables, the number of lucibufagins was expanded from 17 to a total of 29.
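The core of such a workflow (binning product ions into fixed-width m/z vectors, then agglomerating spectra until the nearest pair of clusters exceeds a distance cutoff) can be sketched as follows. This is a hypothetical illustration; the bin width, distance measure and linkage rule are our own choices, not those of the published pipeline:

```python
# Toy MS/MS clustering sketch: bin product ions, then single-linkage
# agglomeration with a cosine-distance cutoff. Illustrative only.
import math

def bin_spectrum(peaks, width=1.0, n_bins=10):
    """Sum product-ion intensities of (mz, intensity) pairs into fixed bins."""
    vec = [0.0] * n_bins
    for mz, inten in peaks:
        vec[min(int(mz // width), n_bins - 1)] += inten
    return vec

def cosine_dist(a, b):
    """1 - cosine similarity between two binned spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def single_linkage(vectors, cutoff):
    """Merge the closest pair of clusters until all pairs exceed `cutoff`."""
    def link(ci, cj, clusters):
        return min(cosine_dist(vectors[i], vectors[j])
                   for i in clusters[ci] for j in clusters[cj])
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        ci, cj = min(((a, b) for a in range(len(clusters))
                      for b in range(a + 1, len(clusters))),
                     key=lambda p: link(p[0], p[1], clusters))
        if link(ci, cj, clusters) > cutoff:
            break
        clusters[ci] += clusters.pop(cj)
    return clusters

spectra = [[(1.2, 5.0), (3.4, 2.0)],   # two spectra with similar fragments
           [(1.3, 4.0), (3.6, 2.5)],
           [(8.1, 9.0)]]               # one distinct spectrum
vecs = [bin_spectrum(s) for s in spectra]
groups = single_linkage(vecs, cutoff=0.5)
```

Inspecting the shared bins within each resulting group is the analogue of the "cluster ion table" interrogation described above.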

    Reutilization of waste fungal biomass for concomitant production of proteochitinolytic enzymes and their catalytic products by Alcaligenes faecalis SK10

    Fungal biomass, being organic waste, could be an excellent source of protein, carbohydrate and minerals; however, it has not been fully exploited until now. Efficient management of this waste can not only reduce the environmental impact of its disposal but also yield value-added metabolites. In the present study, to explore this potential, we used dead fungal biomass of Aspergillus niger SKN1 as the substrate for both fermentative and enzymatic biodegradation, carried out respectively by the potent proteo-chitinolytic bacterium Alcaligenes faecalis SK10 and its enzyme cocktail. The results revealed that a reasonable amount of protease and chitinase could be biosynthesized in the fermentative mode of utilization, while a mixture of amino acids, peptides and low-molecular-weight amino sugars (monomeric and oligomeric forms of N-acetylglucosamine) could be generated through enzymatic hydrolysis. The physicochemical conditions of both bioprocesses were subsequently optimized through a statistical approach. The projected utilization of zero-valued waste fungal biomass offers a sustainable and environmentally sound method for producing microbial metabolites, and its large-scale execution would be in tune with the principles of the circular economy.

    Comparison of nonparametric Lorenz curves and regression functions under inequality constraints

    Comparison of two populations with respect to their income inequalities is an important topic in economic and social studies. The Lorenz curve is a useful measure of income inequality. The Lorenz ordinate L(x) at x (0 ≤ x ≤ 1) is defined as the proportion of the total income that is owned by the lowest-earning 100x percent of people. It turns out that L(x) is an increasing convex function defined on the interval [0,1], with L(0) = 0 and L(1) = 1. If the Lorenz curve of population 1 never lies below that of population 2, then population 1 displays less income inequality than population 2, and we say that population 1 Lorenz dominates population 2. In general, income inequality studies are performed under the assumption that the income distribution has a known parametric form. By contrast, in this thesis we allow the income distribution to take any arbitrary shape. Since, in practice, the income distribution is not known, the methods that we propose are based on more realistic assumptions. The majority of the Lorenz dominance tests proposed in the literature compare two Lorenz curves at a finite number of points on the x-axis. Recently, Bhattacharya (2007) proposed a Lorenz dominance test that compares two Lorenz curves on the entire x-axis, without using any scale function. In Chapter 3 of this thesis, a consistent test is developed for Lorenz dominance, based on a class of weight functions. The hypotheses for the test are formulated by comparing two Lorenz curves over the entire x-axis. A simulation study is performed to compare the performance of our proposed test with that of Bhattacharya's (2007) test. The results of this study show that our scale-based test performs better than the test without a scale function. If the Lorenz curves for two populations do not intersect, then Lorenz dominance is easy to interpret and provides a simple way of ranking income distributions with respect to income inequality.
However, experience shows that Lorenz curves often do cross each other, and hence the ranking of populations based on Lorenz dominance is impossible. A common practice for ranking populations when the Lorenz curves intersect is to use summary measures of inequality such as the Gini coefficient. However, any ranking based on a single inequality measure will have limitations. In Chapter 4, we introduce a new testing procedure for ranking two income distributions when the corresponding Lorenz curves may intersect. The idea is to allow the Lorenz curves to cross, but to impose restrictions on the possible difference between them. To this end, we introduce the ideas of non-inferiority and superiority in order to reformulate the inference problem. The testing problem is formed by comparing two Lorenz curves at a finite number of points on the x-axis. We test against the hypothesis that the Lorenz curve of population 2 does not lie below that of population 1 by more than a specified margin (non-inferiority margin) at any point on the x-axis, and that the Lorenz curve of population 2 lies above that of population 1 by more than a specified margin (superiority margin) for at least one point on the x-axis. A bootstrap algorithm is proposed to implement the test. In a simulation study, we observe that the type-I error rates for the test are close to the nominal level. We then illustrate the usefulness of our proposed method using an empirical example. Finally, in Chapter 5, tests for detecting differences between two univariate nonparametric regression curves are developed. The aim of this new method is to establish that one treatment is not inferior to another for the whole population, and also that it is superior for at least a part of the population, when the treatment effect is represented by a nonparametric regression curve.
The inference problem is formulated as a test against the alternative hypothesis (a) that the regression curve for population 1 does not fall below that for population 2 by more than a specified small amount, at any value of the covariate, and (b) that the former exceeds the latter, at some values of the covariate, by more than a specified amount. The test statistic is easy to compute, and tables of asymptotic critical values are also provided. Because the asymptotic test is conservative, a less conservative bootstrap test is proposed and is shown to be asymptotically valid. In a simulation study, we observe that the type-I error rates for these tests are close to the nominal level, and that the bootstrap test exhibits a higher estimated power, as expected.
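The definitions underpinning the thesis can be checked on a toy numerical example (our own illustration, not taken from the thesis): the empirical Lorenz ordinate L(x) and a pointwise grid check of Lorenz dominance:

```python
# Toy empirical Lorenz curve and a grid-based dominance check.
# Illustrative only; the thesis's tests are far more sophisticated.

def lorenz(incomes, x):
    """Empirical Lorenz ordinate L(x): share of total income held by the
    lowest-earning fraction x of the population (0 <= x <= 1)."""
    inc = sorted(incomes)
    n, total = len(inc), sum(inc)
    k = x * n
    full = int(k)  # number of people counted in full
    share = sum(inc[:full]) + (k - full) * (inc[full] if full < n else 0)
    return share / total

def dominates(pop1, pop2, grid=None):
    """True if pop1's Lorenz curve never lies below pop2's on the grid,
    i.e. pop1 Lorenz dominates pop2 (less income inequality)."""
    grid = grid or [i / 100 for i in range(101)]
    return all(lorenz(pop1, x) >= lorenz(pop2, x) for x in grid)

equal = [10, 10, 10, 10]   # perfectly equal incomes: L(x) = x
skewed = [1, 2, 3, 34]     # same total income, concentrated at the top
print(lorenz(equal, 0.5))  # -> 0.5: the bottom half owns half the income
```

Here the two curves never cross, so dominance holds in one direction only; the non-inferiority and superiority margins of Chapter 4 are precisely a way of ranking populations when this clean separation fails.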
